Profiling-Assisted Decoupled Access-Execute

Authors

  • Jonatan Waern
  • Per Ekemark
  • Konstantinos Koukos
  • Stefanos Kaxiras
  • Alexandra Jimborean
Abstract

As energy efficiency has become a critical factor in the embedded systems domain, dynamic voltage and frequency scaling (DVFS) techniques have emerged as a means to control a system's power and energy efficiency. Additionally, because of the compact design of embedded systems, thermal issues become prominent. State-of-the-art work promotes software decoupled access-execute (DAE), which statically generates code amenable to DVFS techniques. The compiler builds memory-bound access phases, designed to prefetch data into the cache at low frequency, and compute-bound execute phases, which consume the data and perform computations at high frequency. This work investigates techniques to find the optimal balance between lightweight and efficient access phases. A profiling step guides the selection of loads to be prefetched in the access phase. For applications whose behavior varies significantly with the input data, the profiling can be performed online, accompanied by just-in-time compilation. We evaluated the benefits in energy efficiency and performance for both static and dynamic code generation and showed that precise prefetching of critical loads can result in 20% energy improvements, on average. DAE is particularly beneficial for embedded systems: by alternating access phases (executed at low frequency) and execute phases (executed at high frequency), DAE proactively reduces the temperature and therefore prevents thermal emergencies.
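The access/execute split described in the abstract can be illustrated with a minimal C sketch. It is not the authors' compiler output: the function names, the chunked loop, and the `set_frequency` hook are all hypothetical, and `__builtin_prefetch` is a GCC/Clang builtin assumed to be available. The sketch only shows the shape of the idea, where a lightweight access phase touches the addresses that the following execute phase will consume.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical DVFS hook: a real system would request a frequency change
 * from the OS or hardware here. This stub only records the request. */
typedef enum { FREQ_LOW, FREQ_HIGH } freq_level;
freq_level current_freq = FREQ_HIGH;
static void set_frequency(freq_level f) { current_freq = f; }

/* Access phase: address computation and prefetches only, no arithmetic
 * on the data itself. Assumes the GCC/Clang __builtin_prefetch builtin. */
static void access_phase(const double *a, const size_t *idx,
                         size_t begin, size_t end) {
    for (size_t i = begin; i < end; ++i)
        __builtin_prefetch(&a[idx[i]], 0 /* read */, 3 /* keep in cache */);
}

/* Execute phase: the original computation, ideally hitting in the cache. */
static double execute_phase(const double *a, const size_t *idx,
                            size_t begin, size_t end) {
    double sum = 0.0;
    for (size_t i = begin; i < end; ++i)
        sum += a[idx[i]] * a[idx[i]];
    return sum;
}

/* Alternate access (low frequency) and execute (high frequency) phases
 * over chunks of an indirect sum of squares. */
double dae_sum_squares(const double *a, const size_t *idx,
                       size_t n, size_t chunk) {
    assert(chunk > 0);
    double total = 0.0;
    for (size_t begin = 0; begin < n; begin += chunk) {
        size_t end = (n - begin < chunk) ? n : begin + chunk;
        set_frequency(FREQ_LOW);   /* memory-bound phase */
        access_phase(a, idx, begin, end);
        set_frequency(FREQ_HIGH);  /* compute-bound phase */
        total += execute_phase(a, idx, begin, end);
    }
    return total;
}
```

In this sketch every load is prefetched. The profiling step mentioned in the abstract would instead keep only the loads judged critical, so that the access phase stays lightweight; how that selection is made is not specified here.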


Similar resources

The Latency Hiding Effectiveness of Decoupled Access/Execute Processors

Several studies have demonstrated that out-of-order execution processors may not be the most adequate organization for wide issue processors due to the increasing penalties that wire delays will cause in the issue logic. The main target of out-of-order execution is to hide functional unit latencies and memory latency. However, the former can be quite effectively handled at compile time and this...


Towards Power Efficiency on Task-Based, Decoupled Access-Execute Models

This work demonstrates the potential of hardware and software optimization to improve the effectiveness of dynamic voltage and frequency scaling (DVFS). For software, we decouple data prefetch (access) and computation (execute) to enable optimal DVFS selection for each phase. For hardware, we use measurements from state-of-the-art multicore processors to accurately model the potential of per-co...


Decoupled Access/Execute Metaprogramming for GPU-Accelerated Systems

We describe the evaluation of several implementations of a simple image processing filter on an NVIDIA GTX 280 card. Our experimental results show that performance depends significantly on low-level details such as data layout and iteration space mapping which complicate code development and maintenance. We propose extending a CUDA or OpenCL like model with decoupled Access/Execute (“Æcute” [1]...


Code Partitioning in Decoupled Compilers

Decoupled access/execute architectures seek to maximize performance by dividing a given program into two separate instruction streams and executing the streams on independent cooperating processors. The instruction streams consist of those instructions involved in generating memory accesses (the Access stream) and those that consume the data (the Execute stream). If the processor running the ac...


Design and VLSI implementation of an access processor for a decoupled architecture

Decoupled computer architectures provide high scalar performance by exploiting the fine-grained parallelism existing between the access and execute functions in a computer program. These architectures employ an access processor to perform data fetch ahead of demand by the execute process. Some of the decoupled architectures employ identical access and execute processors, but special processors t...



Journal:
  • CoRR

Volume abs/1601.01722

Publication date 2016